Publicis Sapient | Manager-Data Engineering-Azure Interview Experience | 7+ YoE



Round 1: Technical Round - Screening

✅ Tell me about yourself and any recent projects you have been a part of.

✅ Questions related to your projects.

✅ How would you migrate data from an on-premises SQL database and stream data to Azure?

✅ Define a pipeline creation strategy for condition-based scenarios (2-3 conditions were asked).

✅ Spark - Define Partitioning strategy (Physical & Logical)

PySpark optimisation techniques, and any instance where you applied them in a real project.

✅ Cluster formation and resource calculation based on the data provided; how do you scale if the data volume increases or decreases?

✅ How do you implement Logic Apps in ADF?

✅ What types of transformations have you performed in your projects?

✅ How do you choose between groupByKey and reduceByKey?

✅ What are SCD types, and where and how can you implement them?

✅ What are the differences between Delta table, Data Lake and Warehouse?

✅ How do you read data from ADLS using SQL Pool in Synapse?

✅ What are managed tables, external tables, and materialised views, and how are they used?

✅ Coding questions in PySpark (broadcast joins and left anti joins) and SQL (CTEs and analytic functions).
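For the SQL half of the coding question above, here is a minimal sketch of a CTE combined with an analytic function. The `emp` table, its rows, and the "top earner per department" task are hypothetical, made up for illustration; the actual interview queries are not recorded here. It runs on Python's built-in `sqlite3` module and assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('asha', 'data', 120), ('ben', 'data', 100), ('chen', 'data', 90),
        ('dev', 'web', 80), ('eli', 'web', 95);
""")

# The CTE ranks employees within each department using an analytic
# function; the outer query keeps only the top-ranked row per department.
query = """
WITH ranked AS (
    SELECT name, dept, salary,
           ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM emp
)
SELECT dept, name, salary
FROM ranked
WHERE rn = 1
ORDER BY dept
"""
top_paid = conn.execute(query).fetchall()
print(top_paid)
```

The same `WITH ... ROW_NUMBER() OVER (PARTITION BY ...)` shape should carry over to Spark SQL or Synapse with little change, though that is worth verifying on the target engine.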

Round 2: Technical Round - Architecture & Coding

✅ Assigned a workspace and a detailed use case, with the task of preparing a PowerPoint presentation on the high-level architecture of the pipeline workflow.

✅ For that pipeline, define the cloud platform and the relevant services for each use case.

✅ For the same use case, some queries had to be implemented in Python using pandas or PySpark.

Round 3: Technical Round - In Depth

✅ Architecture framework in Big Data (when to use what, pros/cons of various techs, arch. principles/guidelines)

Architecture of batch processing

Lambda architecture vs Kappa architecture vs Medallion architecture

What is the DAG Scheduler? How does it work in Spark?

Difference between a list and a tuple in Python
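A short sketch of the two differences that usually come up for this question: mutability and hashability. The variable names are illustrative only.

```python
nums_list = [1, 2, 3]
nums_tuple = (1, 2, 3)

# Lists are mutable: they can grow and be modified in place.
nums_list.append(4)

# Tuples are immutable: item assignment raises TypeError.
try:
    nums_tuple[0] = 99
    tuple_mutated = True
except TypeError:
    tuple_mutated = False

# A tuple of hashable items is itself hashable, so it can be a dict key.
lookup = {nums_tuple: "ok"}

# A list is never hashable, so using it as a dict key raises TypeError.
try:
    bad = {tuple_mutated: None, nums_list: "nope"}
    list_hashable = True
except TypeError:
    list_hashable = False

print(nums_list, tuple_mutated, list_hashable)
```

Note that a tuple is only hashable when all of its elements are; a tuple containing a list cannot be used as a dict key either.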

✅ Write Python code to convert the string a3bc2gh into abbbbcgggh
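One possible solution, as a sketch. The rule is inferred from the single example given, not stated by the interviewer: a number n placed before a character adds n extra copies of that character (so `3b` becomes `bbbb`, and `2g` becomes `ggg`). The function name `expand` and the handling of multi-digit numbers and trailing digits are assumptions.

```python
def expand(s: str) -> str:
    """Expand a string where a number n before a character adds n extra copies."""
    out = []
    count = ""  # digits collected so far for the upcoming character
    for ch in s:
        if ch.isdecimal():
            count += ch
        else:
            extra = int(count) if count else 0
            out.append(ch * (extra + 1))
            count = ""
    # Assumption: digits at the very end, with no character after them,
    # are silently dropped.
    return "".join(out)


print(expand("a3bc2gh"))
```

If the interviewer intended a different rule (for example, n total copies rather than n extra), only the `extra + 1` expression needs to change.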

What is a window function, and what are the different types of window functions?

Difference between UNION and UNION ALL? Which one is faster?
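A small sketch of the behavioural difference, run through Python's built-in `sqlite3` module. The tables `a` and `b` and their values are hypothetical. UNION removes duplicate rows from the combined result, while UNION ALL keeps every row; because it skips the deduplication step, UNION ALL is generally the faster of the two.

```python
import sqlite3

# Hypothetical sample tables with overlapping values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION: duplicates are removed across the combined result.
union_rows = sorted(
    row[0] for row in conn.execute("SELECT x FROM a UNION SELECT x FROM b")
)

# UNION ALL: every row from both inputs is kept.
union_all_rows = sorted(
    row[0] for row in conn.execute("SELECT x FROM a UNION ALL SELECT x FROM b")
)

print(union_rows)
print(union_all_rows)
```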

What is the CAP theorem?

Difference between Cassandra and MongoDB in terms of technical specifications.

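Difference between the Parquet, Avro and ORC file formats? Why doesn't Hive support Parquet? (Hive has supported Parquet natively since version 0.13, so the premise is worth questioning.)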

Have you used any analytical functions?

Techniques for Query Optimization

What is the Catalyst Optimizer?

What is the difference between ELT and ETL? What are the advantages & disadvantages of both? What are the challenges?

Difference between git merge and git rebase.

Difference between MongoDB and HBase.

Various questions on Spark SQL, API queries, and the architecture of Spark and Databricks.

Awareness of the latest trends in data engineering and Data Mesh

✅ Difference between sensors and operators in Airflow, and how to connect Airflow to Azure (ADLS and Functions)

Round 4: Techno Managerial

✅ Tell me about yourself, the recent project you were part of, and your roles and responsibilities.

✅ How do you create a Delta table in Databricks, and how would you manage ACID guarantees within it?

✅ Core principles of the company.

✅ Why do you want to join us, and why should we select you for this position?

✅ How will you manage data governance and country-specific data protection acts such as HIPAA?

✅ General discussion about problems faced in the project during development and production rollout, and how they were fixed.

✅ How did you manage team members, customers, business analysts, and other stakeholders?

✅ How did you manage project details and versions of code and documents? (read about Jira and Git)

Round 5: HR

✅ Discussion around my experience and projects, some resume-based questions

✅ What are you expecting in your next job role & compensation?

✅ How soon can you join the company, and what is your preferred location?